Add a formal semver 2.0.0 version type #371
base: feature-PR371-semver2.0
First crack at adding a formal version type in response to CVEProject#362 (comment). Any others which are agreed upon should be spun up in their own PRs so that conversations in the PRs can be kept on topic. Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.
I recommend you resubmit the PR with a change in both It will be best to target JSON schema validation instead of programmatically verifying versions when they are specific, as in this scenario where clear semver-2.0.0 compliance is being tested. Secondly, we should follow the current schema model and extend it to satisfy this need instead of adding completely new JSON schema fields like See the current versions.md document, which has some examples: https://github.com/CVEProject/cve-schema/blob/main/schema/docs/versions.md
The one we don't currently have is the So your example will actually look like
You need to build a JSON schema validator to work with such data, with versionType frozen with enum as |
Thanks for the comment, and I can update the JSON in this PR once we get to consensus 👍 With respect to the range fields themselves, after seeing you rewrite my example I think it makes sense to simplify and create new fields so that a parser doesn't need to implement conditional logic based on the combination of fields present. I think this will make for simpler and more maintainable code long term. Maybe more people can chime in on this point. As for the regex, it looks like the one you're suggesting is the second of the two provided on semver.org. Albeit with a leading and trailing For documentation's sake here are the two
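For readers who want to try the regex check discussed above, here is a minimal sketch in Python. It assumes the named-group pattern published on semver.org (the Python-compatible one of the two, reproduced from memory, so please verify it against the site before relying on it). The function name `is_semver_2_0_0` is made up for this illustration and is not part of the PR.

```python
import re

# Assumption: this is the named-group pattern from semver.org, split across
# lines for readability. re.ASCII keeps \d limited to the digits 0-9.
SEMVER_2_0_0 = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    flags=re.ASCII,
)


def is_semver_2_0_0(version):
    """Return True if the whole string is a semver 2.0.0 version (hypothetical helper)."""
    # fullmatch rejects a trailing newline, which a bare "$" would tolerate.
    return SEMVER_2_0_0.fullmatch(version) is not None


print(is_semver_2_0_0("1.2.3-alpha"))    # True
print(is_semver_2_0_0("2.3.4+build17"))  # True
print(is_semver_2_0_0("1.2"))            # False
```

A schema `pattern` would express the same constraint declaratively; the Python form is only meant to make the accepted and rejected strings easy to experiment with.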
…for the expressions of "everything under X" or "everything over Y"
Had a thought hit me about one sided ranges, so I added two more examples
Which allow someone to express the idea of |
…-02-20. The status conversation will happen another day
@sei-vsarvepalli where does the |
The field |
Gotcha. Then I guess the difference between the two approaches in schema terms is to add a

I've written a pretty simple parser in Python for my proposal. It assumes perfect (validated) data and that the data is semver-2.0.0, but I think it gets the point across on the simplicity of parsing. Feel free to play around with it by changing the parameters in the test. I think I covered all the cases and it can probably be simplified further.

import json

test_json_string = """
{
  "versionType": "semver-2.0.0",
  "status": "affected",
  "exclusiveLowerBound": "1.2.3-alpha",
  "inclusiveUpperBound": "2.3.4+build17"
}
"""


def parse_decoded_json(entry):
    # A singleton version is reported on its own.
    if entry.get("exactly"):
        return f'= {entry.get("exactly")}'
    # Lower bound: inclusive, exclusive, or absent (one-sided range).
    if entry.get("inclusiveLowerBound"):
        lower = f'>= {entry.get("inclusiveLowerBound")}'
    elif entry.get("exclusiveLowerBound"):
        lower = f'> {entry.get("exclusiveLowerBound")}'
    else:
        lower = ""
    # Upper bound: inclusive, exclusive, or absent (one-sided range).
    if entry.get("inclusiveUpperBound"):
        upper = f'<= {entry.get("inclusiveUpperBound")}'
    elif entry.get("exclusiveUpperBound"):
        upper = f'< {entry.get("exclusiveUpperBound")}'
    else:
        upper = ""
    # Join only the bounds that are present so one-sided ranges print cleanly.
    return ", ".join(bound for bound in (lower, upper) if bound)


the_json = json.loads(test_json_string)
print(parse_decoded_json(the_json))

I initially had

lower = f'{">= " + entry.get("inclusiveLowerBound") if entry.get("inclusiveLowerBound") else "> " + entry.get("exclusiveLowerBound")}'
upper = f'{"<= " + entry.get("inclusiveUpperBound") if entry.get("inclusiveUpperBound") else "< " + entry.get("exclusiveUpperBound")}'

However, that doesn't handle one-sided ranges, and I wanted to get some code up before today's QWG meeting. I also haven't had time to make a complete comparison parser, but translating the section

if entry.get("exactly"):
    return f'= {entry.get("exactly")}'

results in something that needs to look like

if entry.get("version") and not (entry.get("lessThan") or entry.get("greaterThan") or entry.get("lessThanOrEqual")):
    return f'= {entry.get("version")}'

as the code needs to be sure that the parameter
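The comment above mentions that a complete comparison parser was not written. As a rough sketch of what one could look like, the following implements semver 2.0.0 precedence (build metadata ignored, pre-releases ordered below their release) and tests membership against the five proposed properties used in the parser above. The names `semver_key` and `in_range` are hypothetical, and like the original parser this assumes the input has already been validated.

```python
def semver_key(version):
    """Build a sortable key that follows semver 2.0.0 precedence rules."""
    core, _, _build = version.partition("+")  # build metadata never affects order
    core, _, prerelease = core.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A release ranks above every pre-release of the same core version.
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers sort numerically and below alphanumeric ones.
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


def in_range(version, entry):
    """Return True if `version` falls inside the range described by `entry`."""
    key = semver_key(version)
    if "exactly" in entry:
        return key == semver_key(entry["exactly"])
    if "inclusiveLowerBound" in entry and key < semver_key(entry["inclusiveLowerBound"]):
        return False
    if "exclusiveLowerBound" in entry and key <= semver_key(entry["exclusiveLowerBound"]):
        return False
    if "inclusiveUpperBound" in entry and key > semver_key(entry["inclusiveUpperBound"]):
        return False
    if "exclusiveUpperBound" in entry and key >= semver_key(entry["exclusiveUpperBound"]):
        return False
    return True


entry = {"exclusiveLowerBound": "1.2.3-alpha", "inclusiveUpperBound": "2.3.4+build17"}
print(in_range("1.2.3-alpha", entry))  # False: the lower bound is exclusive
print(in_range("2.3.4", entry))        # True: build metadata is ignored
```

Because each property maps to exactly one comparison, no property needs to be interpreted differently depending on which other properties are present, which is the simplicity argument made in the comment.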
@sei-vsarvepalli the new properties are in as of commit 62db169, however I'm not sure how to express the valid combinations of parameters for the semver 2.0.0 version type. Do I need to do something like a
Where the first option in the one of is the entire current payload and the other is the semver 2.0.0? Maybe you know a simpler approach? |
If this is valid then still need to ensure version type is set to semver-2.0.0 for these combinations
I let this stew for a bit and I think 046dadd is in the right direction. I think it's possible to only allow those parameter combinations when the version type is semver 2.0.0, but I'm not sure how to encode that yet.
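To make the "valid combinations" question concrete, here is one possible reading written as plain Python rather than as schema keywords. The rules are an assumption inferred from the parser earlier in the thread (a singleton stands alone; a range has at least one bound and at most one bound per side), not something the PR has settled. `valid_combination` is a hypothetical name.

```python
LOWER = ("inclusiveLowerBound", "exclusiveLowerBound")
UPPER = ("inclusiveUpperBound", "exclusiveUpperBound")


def valid_combination(entry):
    """Check one assumed reading of the allowed property combinations."""
    if entry.get("versionType") != "semver-2.0.0":
        # This sketch only covers the new version type.
        return False
    lowers = [name for name in LOWER if name in entry]
    uppers = [name for name in UPPER if name in entry]
    if "exactly" in entry:
        # A singleton must not be mixed with range bounds.
        return not lowers and not uppers
    # A range needs at least one bound and at most one bound per side.
    return len(lowers) <= 1 and len(uppers) <= 1 and bool(lowers or uppers)


print(valid_combination({"versionType": "semver-2.0.0", "exactly": "1.2.3"}))  # True
print(valid_combination({"versionType": "semver-2.0.0", "exactly": "1.2.3",
                         "inclusiveUpperBound": "2.0.0"}))                     # False
```

Whatever encoding the schema ends up using would need to accept and reject the same cases if this reading is the intended one.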
@sei-vsarvepalli Ok, so I'm trying to run the tests locally and it seems I need to rebuild However that file doesn't seem to reference the CVE schema file that I've been making edits to, so I'm a little confused how this all works for local testing. Am I missing something basic here? Am I editing the wrong file? |
What tests are you running? It looks like the starting point of your repo is Your JSON file is also mangled, the line 323 is missing a comma. When I run test against your branch I get this error
Thanks for pointing out the comma. Added that in. I'm trying to run the node validation suite with
Which made me think that the validation is failing to match a case on the versions section and hence looking into |
I've gone ahead and addressed some of the trailing commas (line numbers would help for the others 🙇) as well as the asymmetry in parameter requirements. I believe we already discussed
I believe you meant to use If you think cve services should provide range checking I'd love to work with you on building that out 👍
Oh, good catch. For what it's worth, it looks to me like versions can already mismatch today too, e.g.
So, I asked back here #371 (comment) if a reference implementation would be helpful and it seems like maybe it would be. I do wonder if the CVE website could simply display the versions as strings though, as I believe that's how current versions are handled.
I'm not in love with that, but I could be open to it. I'd like to get broader consensus before entertaining the idea. |
Regarding having CVE Services check version bounds, since it's not possible within the schema constraints: in the Package URL proposal we've recently agreed that CVE Services would be responsible for validating Package URLs, since Package URL parsing is too complex to constrain in a regex inside the schema. I think it's fine that some constraints end up in CVE Services when they can't be done in the schema. |
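As an illustration of the kind of bounds check that cannot be expressed in the schema, a service-side check might look roughly like the sketch below. It is self-contained, so it carries its own precedence key, and it uses the five property names from the parser earlier in the thread. `bounds_are_consistent` is a hypothetical name, and nothing here reflects how CVE Services is actually implemented.

```python
def semver_key(version):
    """Sortable key following semver 2.0.0 precedence (validated input assumed)."""
    core, _, _build = version.partition("+")
    core, _, prerelease = core.partition("-")
    numbers = tuple(int(part) for part in core.split("."))
    if not prerelease:
        return numbers + (1, ())
    identifiers = tuple(
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in prerelease.split(".")
    )
    return numbers + (0, identifiers)


def bounds_are_consistent(entry):
    """Reject ranges whose lower bound lies above (or empties out) the upper bound."""
    lower = entry.get("inclusiveLowerBound") or entry.get("exclusiveLowerBound")
    upper = entry.get("inclusiveUpperBound") or entry.get("exclusiveUpperBound")
    if lower is None or upper is None:
        return True  # singletons and one-sided ranges have nothing to cross-check
    low, high = semver_key(lower), semver_key(upper)
    if "inclusiveLowerBound" in entry and "inclusiveUpperBound" in entry:
        return low <= high  # [a, a] still contains one version
    return low < high  # any exclusive bound makes an equal pair empty


print(bounds_are_consistent({"inclusiveLowerBound": "2.0.0", "exclusiveUpperBound": "1.0.0"}))  # False
print(bounds_are_consistent({"inclusiveLowerBound": "1.0.0-rc.1", "inclusiveUpperBound": "1.0.0"}))  # True
```

The second call shows why a regex alone is not enough: deciding that a pre-release sits below its release requires the precedence rules, not just pattern matching.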
I had intentionally used
but either one is valid in the proposed schema. This isn't an inherited problem. There are now two properties that have the same meaning in this context ( More importantly, a semver-2.0.0 data producer needs to be aware of:
I believe rule 2 is too ridiculous and we shouldn't ship a schema with that behavior, because the support costs would be too high. Rule 3 had also been harmful to data integrity, because it conflates the concepts of "don't know" with "a version named 0.0.0-0 existed and was vulnerable." |
To find the remaining trailing commas without local tools, one can use websites such as jsonlint.com
It only identifies one of the trailing commas at a time. I don't know how many remain (there's at least one). |
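If a local Python interpreter is available, a small script can list every suspected trailing comma in one pass instead of one at a time. This is a rough sketch: it uses a plain regex, so a comma followed by a closing bracket inside a string value would be reported as a false positive. `find_trailing_commas` is a hypothetical helper name.

```python
import re

# A comma followed only by whitespace and then a closing brace or bracket.
TRAILING_COMMA = re.compile(r",\s*[}\]]")


def find_trailing_commas(text):
    """Return the 1-based line number of each suspected trailing comma."""
    return [
        text.count("\n", 0, match.start()) + 1
        for match in TRAILING_COMMA.finditer(text)
    ]


sample = '{\n  "a": 1,\n  "b": {"c": 2,\n  },\n}'
print(find_trailing_commas(sample))  # [3, 4]
```

To run it against a schema file, read the file contents into a string and pass that string to the function.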
I'll get to the rest of the trailing commas later today. Thanks for the tool 👍
To address these point by point
For 'How would you feel about a construction where lower bounds always use lessThan/lessThanOrEqual and upper bounds always use greaterThan/greaterThanOrEqual?': I am opposed to this for the 5.x version series of the CVE Record Format. Consumers today can use the Writing a reference implementation of different behavior is not sufficient. If we change the behavior at some future point (in favor of greaterThanOrEqual or other new properties), then we need a communication plan that can effectively reach consumers, and a substantial period of time for consumers to adapt their use cases to new business logic. This would typically be announced as a substantial update, one with breaking changes for I believe the correct approach to SemVer is along the lines of what I originally suggested in 2023 at #263 - that:
I tried to extract every string from every current CVE Record that is intended to be a SemVer version number, and then I compared them to the SemVer 1 regular expression and to the SemVer 2.0.0 regular expression. The result was that 1.4% (see below) of these version numbers were valid for SemVer 1 but not valid for SemVer 2.0.0. A reasonable assessment is that SemVer 2.0.0 is sufficient for the community's needs, and should be what "semver" means in the CVE Record Format going forward. We should not be considering a complex and controversial semver-2.0.0 proposal to address a 1.4% case.
There was some discussion in the QWG today about a schema in which there was always a
With this schema, it is valid to write:
(i.e., no |
@ElectricNroff can we dig in on a concern I may not be fully understanding? Let's use the json you just posted
I see this as an encoding of the expression
Give me a bit more time to parse your longer post above. |
Yes, I want the Just having
if you ignore |
I think these two sentences are at odds with each other (at least as I read them). If the meaning of |
For your larger post, I think we touched on most of those points in the QWG meeting yesterday and noted that the concern is less about the small diff from semver 1 to semver 2 and more about the general inconsistency. Also curious where you found the semver 1 regex. If you think there's a point in there I'm glossing over that I shouldn't be or that wasn't addressed synchronously please call it out. For the new construction what do you think about this. We keep.
and in the case that
and I'll propose two special cases which would otherwise be invalid ranges so that we can capture
and
with the understanding that
and
This gives us all of our normal mathematical range tools aside from a non-inclusive lower bound With this construction the |
I don't think that typical SemVer implementations would consider it invalid to check whether an observed version number is less than 0.0.0. Consequently, they have no innate knowledge that a range that ends in 0.0.0 is invalid. They would just do the math as defined by the SemVer specification, and conclude that any observed version is simply not inside the range. Therefore, this is a breaking change because all such code would need to be changed. Here is one example of SemVer comparison within an Open Source vulnerability scanning product:
Indeed it is atypical. It was designed to meet your requirement of the
They have no innate knowledge of anything today. Please see: #362
We have yet to define what is and is not a breaking change 👍. Please see #418 |
At the request of the QWG meeting today, here are the other two constructions. For ease of readability I'll be breaking these into two posts. The first was introduced in e637776 and was designed to be completely new so that existing parsers would be the least likely to misinterpret the new data. It was discussed back around this comment: #371 (comment). Five new properties are introduced. Which would allow the construction of The singleton
Two sided ranges
One sided ranges
The design is primarily for machines, but I think the wording choice also makes it easy for an uninitiated human with a basic mathematics education to understand the raw data in a pinch. The use of completely new properties is to avoid any interpretation conflict with existing parsers. The choice of breaking out |
The second choice was introduced in a72e5b8 to "avoid bloat" by request. Two new properties are introduced. Which are then used in expressions as
Two sided ranges
Two sided ranges with exclusive lower bounds were not implemented and it's unclear how to cleanly implement them with the restrictions that were imposed on this implementation. One could consider something like
and omit One sided ranges
In retrospect I view this construction as something of a halfway house, and as such it is my least favored option. The third construction is here: #371 (comment)
So in the interests of having current state at the end of this very long comment trail, what does that mean for the path forward? |
@andrewpollock That's a question for the QWG chairs @david-waltermire, @ccoffin, @MrMegaZone and potentially the board. I see Dave thumbs up'd the original construction
How would I know which specific versions are between 2.0.0 and 2.5.7 using this approach? { |
They would be the versions |
Got it. So the software producer will know which versions are in the set, but how would a consumer know which versions are in the set?
They would check for whichever version(s) of interest are relevant. Edit: If the question you're asking is more along the lines of
Got it, so if they are running version 2.3.4-beta this would be in the set because it's between 2.0.0 and 2.5.7 |
Right. By the rules of semver
after this PR merges at the very least. I have no idea if this is even on their radar though, to be honest. If there's a tooling related question
@rjb4standards this is distinct from NIST's NVD search API. The Record Format (what we're discussing here), is managed by the CVE project. NVD, maintained by NIST, is a downstream consumer of CVE data. So even if/when this proposal for a new version type is added to CVE, it'll be up to NVD what to do about it. |
@alilleybrinker Thanks for clarifying. Does this mean the CVE Foundation will have a searchable API that supports the semver range? For example, show me all the CVEs for ACME 2.3.4-beta?
@rjb4standards the CVE Foundation is different from the CVE Project. As for what the CVE Project would do for improving search, I recommend talking to the Automation Working Group and/or the Consumer Working Group. You can find out more about the groups here: https://www.cve.org/ProgramOrganization/WorkingGroups |
@alilleybrinker thanks for clarifying. The CVE space is getting very confusing with the looming funding deadline coming fast. |
First crack at adding a formal version type in response to #362 (comment) Any others which are agreed upon should be spun up in their own PRs so that conversations in the PRs can be kept on topic
Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.
Another thought is that maybe this should be a retroactive definition of the semver type. That would likely be breaking for some of the current records though. The goal here is to have strict validation provided by cve services.